The genome sequence of the Plain Longtail butterfly, Spicauda simplicius (Stoll, 1807)

We present a genome assembly from an individual female Spicauda simplicius (the Plain Longtail butterfly; Arthropoda; Insecta; Lepidoptera; Hesperiidae). The genome sequence is 610.1 megabases in span. Most of the assembly is scaffolded into 32 chromosomal pseudomolecules, including the Z and W sex chromosomes. The mitochondrial genome has also been assembled and is 15.54 kilobases in length. Gene annotation of this assembly on Ensembl identified 18,506 protein-coding genes.


Background
Spicauda simplicius (Plain Longtail) is a butterfly of the family Hesperiidae with a neotropical distribution, ranging from northern Mexico to northern Argentina (Evans, 1951). The common name refers to its cryptic colouration (it is plain brown and lacks any white apical forewing patches found in congeneric species) and elongated hind wing tails (Figure 1). The species commonly co-occurs in habitats with other widespread Spicauda species (e.g., S. tanna, S. teleus, S. procne). The species lacks any clearly defined sexual dimorphism.
Spicauda simplicius is a common species in disturbed environments, with strays being reported as far north as Texas (Rickard, 1977) and a single individual being reported from California (Tilden, 1976). It is absent from the Caribbean islands except Trinidad and Tobago (Cock, 1982) and has recently become established in Grenada (Lewis et al., 2012), after being found only as an isolated individual (Smith et al., 1994).

Genome sequence report
The genome was sequenced from one female Spicauda simplicius (Figure 1) collected from Tarapoto, San Martin, Peru (-6.49, -76.36). A total of 23-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 97 missing joins or mis-joins and removed 21 haplotypic duplications, reducing the assembly length by 1.41% and the scaffold number by 14.78%.
The final assembly has a total length of 610.1 Mb in 172 sequence scaffolds with a scaffold N50 of 21.0 Mb (Table 1). The snail plot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.31%) of the assembly sequence was assigned to 32 chromosomal-level scaffolds, representing 30 autosomes and the Z and W sex chromosomes. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). The W chromosome could not be scaffolded, as the Hi-C data were from a male specimen. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
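For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal Python sketch of how an Nx value can be computed from a list of scaffold lengths. The function name `nx` and the toy lengths are assumptions made for illustration; they are not the ilUrbSimp4.1 scaffolds and this is not the code used to produce Table 1.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx length: the length L such that scaffolds of
    length >= L together cover at least `fraction` of the total span."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)   # longest scaffold first
    threshold = fraction * sum(ordered)       # span that must be covered
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]


# Toy scaffold lengths in bp (hypothetical values, for illustration only)
toy_lengths = [50, 40, 30, 20, 10]
toy_n50 = nx(toy_lengths, 0.5)   # 50 + 40 = 90 >= 75, so N50 is 40
toy_n90 = nx(toy_lengths, 0.9)   # 50 + 40 + 30 + 20 = 140 >= 135, so N90 is 20
```

Applied to the real scaffold lengths of an assembly, the same calculation with `fraction=0.5` would give the scaffold N50 and with `fraction=0.9` the scaffold N90.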

Genome annotation report
The Spicauda simplicius genome assembly (GCA_949699795.1) was annotated at the European Bioinformatics Institute (EBI) on Ensembl Rapid Release. The resulting annotation includes 18,688 transcribed mRNAs from 18,506 protein-coding genes.

Table 3. Software tools: versions and sources.

Software tool Version
instrument. Hi-C data were also generated from whole organism tissue of ilUrbSimp8 using the Arima v2 kit. The Hi-C sequencing was performed using paired-end sequencing with a read length of 150 bp on the Illumina NovaSeq 6000 instrument.

Genome assembly and curation
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using YaHS (Zhou et al., 2023). The assembly was checked for contamination and corrected as described previously (Howe et al., 2021).
Manual curation was performed using HiGlass (Kerpedjiev et al., 2018) and PretextView (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2023), which runs MitoFinder (Allio et al., 2020) or MITOS (Bernt et al., 2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence.
Table 3 contains a list of relevant software tool versions and sources.

Genome annotation
The BRAKER2 pipeline (Brůna et al., 2021) was used in the default protein mode to generate annotation for the Spicauda simplicius assembly (GCA_949699795.1) in Ensembl Rapid Release at the EBI.

Wellcome Sanger Institute – Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website here. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

Jerome H L Hui
The Chinese University of Hong Kong, Hong Kong, Hong Kong
In this data note, Ribeiro and colleagues present a genomic resource for a female Plain Longtail butterfly, Spicauda simplicius (Lepidoptera; Hesperiidae). The authors give the authority for Spicauda simplicius as (Stoll, 1807), while other sources refer to it as (Stoll, 1790) (e.g. TAXREF v17.0, https://inpn.mnhn.fr/espece/cd_nom/985422). I tried to look into this further; nevertheless, I did not have access to some crucial studies while drafting this report and could not resolve the issue. I hope the authors can ensure that they are using the correct one, especially as many will refer to this data note in the future when using this genomic resource.
Prior to this study, limited molecular data were available for this species in the NCBI database. As of August 2024, this species remains unassessed by the IUCN. This genome resource is important and will be very useful for further studies addressing ecological, evolutionary, and genomic questions related to lepidopterans and other insects more widely.
This genome resource is excellent according to the summary statistics, with high BUSCO scores, high sequence contiguity (scaffold N50), and the majority of sequence contained in the 30 pseudochromosomes (plus the 2 sex chromosomes and the mitochondrion). All in all, this is a valuable contribution.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: I published with Peter Holland more than three years ago, and confirm that this potential conflict of interest did not affect my ability to write an objective and unbiased review of the article.
Reviewer Expertise: Genomics, evolution, invertebrates
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Elena Pazhenkova
University of Ljubljana, Ljubljana, Slovenia
The article presents a comprehensive genome assembly of the Plain Longtail butterfly, Spicauda simplicius (Lepidoptera, Hesperiidae), a species with a neotropical distribution from northern Mexico to northern Argentina. The authors constructed a high-quality genome assembly using data from Pacific Biosciences single-molecule HiFi long reads and Hi-C data. The resulting genome sequence spans 610 Mb assigned to 32 chromosomal pseudomolecules, representing 30 autosomes and the Z and W sex chromosomes. Additionally, the mitochondrial genome has been assembled and included as a contig in the genome assembly. Gene annotation and manual curation refined the assembly by correcting 97 missing joins or misjoins and removing 21 haplotypic duplications, resulting in a highly accurate and well-resolved genome sequence. The study provides detailed taxonomic information, habitat preferences, and host plant interactions, enhancing the ecological and evolutionary context of the research. Overall, this article offers a valuable resource for researchers interested in the biology and genomics of Spicauda simplicius. The genome assembly presented in this study serves as a robust foundation for future investigations into the species' adaptation, population structure, and evolutionary history.
Is the rationale for creating the dataset(s) clearly described? Yes

Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others?

Figure 1. Photograph of a Spicauda simplicius specimen collected in the same locality as the sequenced specimen.

Figure 2. Genome assembly of Spicauda simplicius, ilUrbSimp4.1: metrics. The BlobToolKit snail plot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 610,070,898 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (27,661,612 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (21,027,237 and 14,173,452 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the lepidoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Urbanus%20simplicius/dataset/ilUrbSimp4_1/snail.
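The size-ordered binning described in the Figure 2 caption can be illustrated with a short Python sketch. With 1,000 bins, each bin covers 0.1% of the 610,070,898 bp assembly, or about 610,071 bp. The function `snail_bins` and the toy lengths below are hypothetical and written only to illustrate the idea; this is not BlobToolKit's implementation.

```python
def snail_bins(lengths, n_bins=1000):
    """Split the cumulative span of size-ordered scaffolds into n_bins
    equal slices and return, for each slice, the length of the scaffold
    that contains the end of that slice."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)   # longest scaffold first
    total = sum(ordered)
    bins = []
    idx = 0
    cumulative = ordered[0]
    for b in range(1, n_bins + 1):
        # Bin b ends at position b * total / n_bins; the comparison is
        # done in integers to avoid floating-point rounding.
        while cumulative * n_bins < b * total:
            idx += 1
            cumulative += ordered[idx]
        bins.append(ordered[idx])
    return bins


# Toy example with hypothetical lengths (bp) and only 10 bins
toy_profile = snail_bins([40, 30, 20, 10], n_bins=10)
# toy_profile == [40, 40, 40, 40, 30, 30, 30, 20, 20, 10]
```

In this sketch the first bin always reports the longest scaffold, which corresponds to the plot radius in the caption, and the bin at the halfway point reports the N50 length.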

Figure 4. Genome assembly of Spicauda simplicius, ilUrbSimp4.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all sequences. Coloured lines show cumulative lengths of sequences assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Urbanus%20simplicius/dataset/ilUrbSimp4_1/cumulative.

Figure 5. Genome assembly of Spicauda simplicius, ilUrbSimp4.1: Hi-C contact map of the ilUrbSimp4.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=dRd7w2eUTVmD4XwxPqvi8A.

Open Peer Review Current Peer Review Status: Version 1
This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.